Constraint Selection by Committee: An Ensemble Approach to Identifying Informative Constraints for Semi-supervised Clustering
نویسندگان
چکیده
A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While the incorporation of pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significantly impact upon the level of improvement. We demonstrate that it is often possible to correctly “guess” a large number of constraints without supervision from the coassociations between pairs of objects in an ensemble of clusterings. Along the same lines, we establish that constraints based on pairs with uncertain co-associations are particularly informative, if known. An evaluation on text data shows that this provides an effective criterion for identifying constraints, leading to a reduction in the level of supervision required to direct a clustering algorithm to an accurate solution.
منابع مشابه
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...
متن کاملWised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...
متن کاملAn Ensemble Approach to Identifying Informative Constraints for Semi-Supervised Clustering
A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While the incorporation of pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significant...
متن کاملOnline Active Constraint Selection For Semi-Supervised Clustering
Due to strong demand for the ability to enforce top-down structure on clustering results, semi-supervised clustering methods using pairwise constraints as side information have received increasing attention in recent years. However, most current methods are passive in the sense that the side information is provided beforehand and selected randomly. This may lead to the use of constraints that a...
متن کاملActive Query Selection and Spectral Eigenvectors Semi-Supervised Clustering
Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semisupervised clustering. We consider active learning in an iterative manner where in each iteration queries are selected based on the current cl...
متن کامل